Improving Keyword Spotting with a Tandem BLSTM-DBN Architecture
نویسندگان
چکیده
We propose a novel architecture for keyword spotting which is composed of a Dynamic Bayesian Network (DBN) and a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net. The DBN uses a hidden garbage variable as well as the concept of switching parents to discriminate between keywords and arbitrary speech. Contextual information is incorporated by a BLSTM network, providing a discrete phoneme prediction feature for the DBN. Together with continuous acoustic features, the discrete BLSTM output is processed by the DBN which detects keywords. Due to the flexible design of our Tandem BLSTM-DBN recognizer, new keywords can be added to the vocabulary without having to re-train the model. Further, our concept does not require the training of an explicit garbage model. Experiments on the TIMIT corpus show that incorporating a BLSTM network into the DBN architecture can increase true positive rates by up to 10 %.
منابع مشابه
A Tandem BLSTM-DBN Architecture for Keyword Spotting with Enhanced Context Modeling
We propose a novel architecture for keyword spotting which is composed of a Dynamic Bayesian Network (DBN) and a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net. The DBN is based on a phoneme recognizer and uses a hidden garbage variable as well as the concept of switching parents to discriminate between keywords and arbitrary speech. Contextual information is incorporated by ...
متن کاملNon-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting
It has been shown in [1, 2] that improved performance can be achieved by formulating the keyword spotting as a non-uniform error automatic speech recognition problem. In this work, we discriminatively train a deep bidirectional long short-term memory (BLSTM) hidden Markov model (HMM) based acoustic model with non-uniform boosted minimum classification error (BMCE) criterion which imposes more s...
متن کاملImproved Bottleneck Feature using Hierarchical Deep Belief Networks for Keyword Spotting in Continues Speech
Bottleneck (BN) feature has attracted considerable attentions by its capacity of improving the accuracies in speech recognition tasks. Recently, researchers have proposed some modified approaches for extracting more effective BN feature, but these approaches still need further improvement. In this paper, motivated by both deep belief networks (DBN) and hierarchical Multilayer Perceptron (MLP), ...
متن کاملKeyword spotting for self-training of BLSTM NN based handwriting recognition systems
The automatic transcription of unconstrained continuous handwritten text requires well trained recognition systems. The semi-supervised paradigm introduces the concept of not only using labeled data but also unlabeled data in the learning process. Unlabeled data can be gathered at little or not cost. Hence it has the potential to reduce the need for labeling training data, a tedious and costly ...
متن کاملLong short-term memory networks for noise robust speech recognition
In this paper we introduce a novel hybrid model architecture for speech recognition and investigate its noise robustness on the Aurora 2 database. Our model is composed of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net exploiting long-range context information for phoneme prediction and a Dynamic Bayesian Network (DBN) for decoding. The DBN is able to learn pronunciation va...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009